Accuracy of genomic prediction for fleece traits in Inner Mongolia Cashmere goats

Fleece traits are important economic traits of goats. With the falling cost of sequencing and genotyping and the improvement of related technologies, genomic selection for goats has become feasible. This study collected pedigree, phenotype and genotype information for 2299 Inner Mongolia Cashmere goats (IMCGs). We estimated fixed effects and compared the estimates of variance components, heritability and genomic predictive ability for fleece traits in IMCGs obtained with pedigree-based Best Linear Unbiased Prediction (ABLUP), Genomic BLUP (GBLUP) and single-step GBLUP (ssGBLUP). The fleece traits considered were cashmere production (CP), cashmere diameter (CD), cashmere length (CL) and fiber length (FL). Year of production, sex, herd and individual age had highly significant effects on the four fleece traits (P < 0.01), so all of these factors should be considered when the genetic parameters of fleece traits in IMCGs are evaluated. The heritabilities of FL, CL, CP and CD estimated with ABLUP, GBLUP and ssGBLUP were 0.26 to 0.31, 0.05 to 0.08, 0.15 to 0.20 and 0.22 to 0.28, respectively. It can therefore be inferred that genetic progress for CL is relatively slow. The predictive ability for fleece traits in IMCGs with GBLUP (56.18% to 69.06%) and ssGBLUP (66.82% to 73.70%) was significantly higher than that with ABLUP (36.73% to 41.25%). The predictive ability with ssGBLUP was 29% to 33% higher than that with ABLUP and 4% to 14% higher than that with GBLUP. ssGBLUP is therefore a superior method for genomic selection of fleece traits in Inner Mongolia Cashmere goats. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-024-10249-7.


Introduction
By the end of 2021, the number of goats in stock in China had reached 133.32 million, and the cashmere yield was about 15,102.18 tons (http://www.stats.gov.cn). The Inner Mongolia Cashmere goat (IMCG) is an important cashmere goat breed in China, famous for its high cashmere yield and excellent cashmere quality. It is a dual-purpose breed producing cashmere and meat. Reducing cashmere diameter and increasing cashmere yield are the breeding objectives for IMCGs. With the development of quantitative genetics and molecular biology, the selection methods used in livestock have gradually improved [1]. A central methodology is the BLUP method proposed by Henderson in 1975 [2]. Here, genetic parameters can be estimated based on the so-called mixed model equations, in which covariance matrices need to be defined. In the standard approach, the pedigree-based relationship matrix (A) is used and the method is referred to as ABLUP. Several studies have demonstrated that the BLUP method can achieve higher genetic gains in pigs than selection on individual phenotype [3,4]. Estimating breeding values for litter size traits of Landrace pigs with BLUP indicated that selection by BLUP is feasible for improving litter size in swine [5]. Jang et al. (2019) assessed the effect of progeny number and pedigree depth on the accuracy of the estimated breeding value (EBV) in Hanwoo beef cattle using the BLUP method; the results showed that EBVs become more precise with more progeny [6].
In 2001, the idea of genomic selection (GS) was proposed by Meuwissen. The method can improve the accuracy of estimated breeding values, increase genetic gain, in particular by shortening the generation interval, and reduce breeding costs [7][8][9][10]. Genomic best linear unbiased prediction (GBLUP) uses genomic relationships to estimate the genetic merit of an individual [11,12]. The genomic relationship matrix (G) defines the covariance between individuals based on observed similarity at the genomic level, rather than on the expected similarity based on pedigree (A). Thus, more accurate predictions of merit can be obtained. The GBLUP method assigns the same variance to all loci and essentially treats them all as equally important. Single-step genomic BLUP (ssGBLUP) was proposed by Legarra et al. [13]. The core idea of ssGBLUP is to combine the pedigree relationship matrix (A) and the genomic relationship matrix (G) into a new relationship matrix (H) [14][15][16][17]. Both GBLUP and ssGBLUP use the same equations as ABLUP, but with different covariance (that is, relationship) matrices.
This approach is beneficial for traits that are difficult to measure and for traits with low heritability. It has been successfully applied to other livestock, such as dairy cattle [18], beef cattle [19], pigs [20], chickens [21] and sheep [22]. In dairy cattle, the accuracies of breeding values for milk fatty acids ranged from low to high, from 0.13 to 0.72 with pedigree information and from 0.18 to 0.74 with genomic information, and predictions for milk yield that used genomic information were confirmed to be more accurate than those from the ABLUP methodology [18]. Zhao (2019) estimated genetic parameters and conducted genomic prediction for five types of sperm morphology abnormalities in a large Duroc boar population using the GBLUP and ssGBLUP methods; the predictive ability of breeding values with ssGBLUP outperformed that with GBLUP [20]. Zhu (2021) evaluated the effect of statistical model, heritability and marker density on genomic prediction of six wool traits in sheep. The results showed that the prediction ability of the GBLUP model was better for traits with low heritability [22]. Muir (2015) reported, using simulated data, that the accuracy of GEBV was higher than that of breeding values estimated with the ABLUP method when enough training generations were provided [23].
Genomic selection has been widely applied in animal breeding programs. However, owing to limitations of sequencing cost and economic benefit, the application of genomic selection in goats has not yet been fully developed. With the construction of reference populations and the development of 70 K commercial SNP genotyping chips for goats, routine application of GS is in sight. In this study, the phenotype, genotype and pedigree records of 2299 IMCGs were used. Genomic prediction of fleece traits in IMCGs was performed using pedigree-based Best Linear Unbiased Prediction (ABLUP), Genomic BLUP (GBLUP) and single-step GBLUP (ssGBLUP). This study provides a reference for genomic selection breeding of Inner Mongolia Cashmere goats.

Phenotypic data
The phenotypic data were collected from the Inner Mongolia YiWei White Cashmere Goat Limited Liability Company, Wulan Town, Etuoke Banner, Ordos City, Inner Mongolia Autonomous Region, China (39°12′N; 107°97′E). A total of 33,623 production performance records of fleece traits for 2256 individuals (372 males and 1884 females) aged 1 to 8 years were collected from 2011 to 2021. The pedigree of all animals could be traced back three generations. The fleece traits included cashmere production (CP), cashmere diameter (CD), cashmere length (CL) and fiber length (FL). The basic statistics of the phenotypic data were analyzed with Microsoft Excel 2021 (https://www.microsoft.com/zh-cn/microsoft-365/excel) and R 4.2.2 (https://www.r-project.org/).

Genotype data
The 2299 individuals were genotyped using the Illumina GGP_Goat_70K BeadChip (Illumina, San Diego, CA). Markers on the sex chromosomes were discarded. SNPs were retained if they had a minor allele frequency (MAF) > 0.05, a proportion of missing genotypes < 0.05 and a Hardy-Weinberg equilibrium (HWE) test P-value > $10^{-5}$; SNPs failing any of these criteria were removed. Moreover, individuals with more than 10% missing genotypes were excluded. Quality control of the genotype data was performed with PLINK 1.9. The genotype data after quality control were used to draw SNP density maps with the CMplot package in R.
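The quality control thresholds above can be illustrated with a minimal Python sketch. It is not the PLINK implementation: the function names, the 0/1/2 genotype coding with `None` for a missing call, and the chi-square approximation to the HWE test are all assumptions made for illustration (PLINK applies its own HWE test).

```python
import math


def snp_passes_qc(genotypes, maf_min=0.05, miss_max=0.05, hwe_p_min=1e-5):
    """Apply the SNP filters to one marker.

    genotypes: allele counts (0, 1 or 2) per individual, None if missing.
    Hypothetical helper; thresholds follow the text.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    missing_rate = genotypes.count(None) / len(genotypes)
    n = len(called)
    p = sum(called) / (2 * n)            # frequency of the counted allele
    maf = min(p, 1 - p)
    if missing_rate >= miss_max or maf <= maf_min:
        return False
    # Chi-square goodness-of-fit test against HWE proportions (1 df).
    expected = [n * (1 - p) ** 2, 2 * n * p * (1 - p), n * p ** 2]
    observed = [called.count(0), called.count(1), called.count(2)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2))   # survival function, 1 df
    return p_value > hwe_p_min


def individual_passes_qc(genotypes, miss_max=0.10):
    """Keep an individual unless more than 10% of its genotypes are missing."""
    return genotypes.count(None) / len(genotypes) <= miss_max


# Toy marker in perfect HWE proportions for 100 individuals.
snp = [0] * 25 + [1] * 50 + [2] * 25
keep = snp_passes_qc(snp)
```

In a real pipeline these filters would be applied marker by marker and animal by animal to the full genotype table; the order in which the filters are applied is not stated in the text.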

Estimation of genetic parameters and genomic breeding value
In this study, the fixed effects, including sex, year of production, herd (1 to 11), individual age, dam age and birth type, were tested with a generalized linear model (GLM). The model was as follows:

$$y_{ijklmno} = \mu + S_i + Y_j + H_k + I_l + D_m + B_n + e_{ijklmno}$$

where $y_{ijklmno}$ is the observation on the animal, $\mu$ is the mean of the observations, $S_i$ is the effect of sex, $Y_j$ is the effect of year of production, $H_k$ is the effect of herd, $I_l$ is the effect of individual age, $D_m$ is the effect of dam age, $B_n$ is the effect of birth type and $e_{ijklmno}$ is the residual effect.
After the fixed effects had been determined, a repeatability animal model was used to estimate the genetic parameters and genomic breeding values with the ABLUP, GBLUP and ssGBLUP methods. All analyses were performed with the ASReml software [24].
In this study, the same model was used for ABLUP, GBLUP and ssGBLUP:

$$y = \mu + Xb + Za + Wc + e \quad (2)$$

where $y$ is the vector of observations, $\mu$ is the mean value vector of the observations, $b$ is the vector of fixed effects, $a$ is the vector of additive genetic effects, $c$ is the vector of permanent environmental effects and $e$ is the vector of residuals. $X$ is the incidence matrix for the fixed effects, $Z$ is the incidence matrix relating observations to additive genetic effects and $W$ is the incidence matrix relating observations to permanent environmental effects.
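A repeatability animal model of this form is usually solved through Henderson's mixed model equations. The sketch below is only an illustration and not the ASReml procedure used in the study: it assumes the mean is absorbed into $b$ through a column of ones in $X$, that the variance ratios $\lambda_a = \sigma^2_e/\sigma^2_a$ and $\lambda_c = \sigma^2_e/\sigma^2_c$ are known, and that the inverse relationship matrix is supplied directly. All names and the toy data are hypothetical.

```python
def solve_linear(m, rhs):
    """Solve m x = rhs by Gaussian elimination with partial pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [float(r)] for row, r in zip(m, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def solve_mme(y, X, Z, W, rel_inv, lam_a, lam_c):
    """Return (b, a, c) from the mixed model equations.

    rel_inv is the inverse relationship matrix (A, G or H, depending on
    the method); lam_a and lam_c are assumed variance ratios.
    """
    p, q, k = len(X[0]), len(Z[0]), len(W[0])
    T = [xr + zr + wr for xr, zr, wr in zip(X, Z, W)]   # rows of [X Z W]
    dim = p + q + k
    # Coefficient matrix T'T and right-hand side T'y.
    C = [[sum(row[i] * row[j] for row in T) for j in range(dim)]
         for i in range(dim)]
    rhs = [sum(row[i] * yi for row, yi in zip(T, y)) for i in range(dim)]
    # Add the penalties for the additive and permanent environmental effects.
    for i in range(q):
        for j in range(q):
            C[p + i][p + j] += lam_a * rel_inv[i][j]
    for i in range(k):
        C[p + q + i][p + q + i] += lam_c
    sol = solve_linear(C, rhs)
    return sol[:p], sol[p:p + q], sol[p + q:]


# Toy data: three unrelated animals with two records each.
y = [10, 12, 8, 8, 6, 4]
X = [[1]] * 6
Z = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]
W = Z
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b, a, c = solve_mme(y, X, Z, W, identity, lam_a=3.0, lam_c=4.0)
```

Swapping the inverse of A for the inverse of G or H in `rel_inv` is the only change needed to move between the three methods in this sketch, which mirrors the statement that the three methods share the same model.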
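In ABLUP, the additive genetic effects are assumed to follow the distribution $N(0, A\sigma^2_a)$, where $\sigma^2_a$ is the additive genetic variance and $A$ is the identity by descent (IBD) relationship matrix constructed from pedigree information. In GBLUP, $A$ is replaced by the genomic relationship matrix $G$, and in ssGBLUP by the combined relationship matrix $H$. The inverse of the $H$ matrix for ssGBLUP is constructed as:

$$H^{-1} = A^{-1} + \begin{bmatrix} 0 & 0 \\ 0 & G^{-1} - A_{22}^{-1} \end{bmatrix}$$

where $A^{-1}$ is the inverse of the pedigree relationship matrix for all individuals, $G^{-1}$ is the inverse of the genomic relationship matrix, and $A_{22}^{-1}$ is the inverse of the pedigree relationship matrix for the genotyped individuals.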
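The construction of the inverse of H can be sketched in a few lines of Python. Several points are assumptions rather than details taken from the text: G is built with a VanRaden-type scaling using allele frequencies estimated from the genotyped animals, G is blended with a small proportion (`w`, here 0.05) of the pedigree block for genotyped animals so that it can be inverted, and the three-animal pedigree and genotypes are invented toy data.

```python
def mat_inv(m):
    """Gauss-Jordan inverse of a small square matrix (list of lists)."""
    n = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vanraden_g(genos):
    """Genomic relationship matrix ZZ' / (2 * sum p(1-p)); assumed scaling."""
    n, m = len(genos), len(genos[0])
    p = [sum(ind[j] for ind in genos) / (2 * n) for j in range(m)]
    z = [[ind[j] - 2 * p[j] for j in range(m)] for ind in genos]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    return [[sum(x * y for x, y in zip(zi, zk)) / scale for zk in z]
            for zi in z]


def h_inverse(a, g, genotyped, w=0.05):
    """Add (G^-1 - A22^-1) to the genotyped block of A^-1.

    genotyped: row indices in a of the genotyped animals, in the order of g.
    w: assumed blending weight on the pedigree block, keeps G invertible.
    """
    n = len(genotyped)
    a22 = [[a[i][j] for j in genotyped] for i in genotyped]
    g_blend = [[(1 - w) * g[r][c] + w * a22[r][c] for c in range(n)]
               for r in range(n)]
    h_inv = mat_inv(a)
    g_inv, a22_inv = mat_inv(g_blend), mat_inv(a22)
    for r, i in enumerate(genotyped):
        for c, j in enumerate(genotyped):
            h_inv[i][j] += g_inv[r][c] - a22_inv[r][c]
    return h_inv


# Toy pedigree: animals 0 and 1 are unrelated parents of animal 2.
A = [[1.0, 0.0, 0.5],
     [0.0, 1.0, 0.5],
     [0.5, 0.5, 1.0]]
genotyped = [1, 2]                 # animal 0 has no genotype
genos = [[0, 1, 2, 1],
         [1, 1, 0, 2]]
G = vanraden_g(genos)
H_inv = h_inverse(A, G, genotyped)
```

Only the block of the inverse that belongs to genotyped animals changes; rows and columns of non-genotyped animals keep their pedigree-based values, and when G equals the pedigree block the result reduces to the inverse of A.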

Accuracy of genetic evaluation
In this study, five-fold cross-validation was used to evaluate the accuracy of genomic prediction. The individuals were randomly divided into five groups; each group in turn served as the validation population, and the other four groups served as the training population. The accuracy of genomic prediction was evaluated as the correlation between the estimated phenotypic value ($\hat{y}_i$) and the true phenotypic value ($y_i$) in the validation population, divided by the square root of the heritability ($h^2$):

$$\text{accuracy} = \frac{\operatorname{cor}(y_i, \hat{y}_i)}{\sqrt{h^2}}$$

The unbiasedness of genomic prediction was evaluated by the regression coefficient of the true phenotypic value on the estimated phenotypic value:

$$b_{y_i,\hat{y}_i} = \frac{\operatorname{cov}(y_i, \hat{y}_i)}{\operatorname{Var}(\hat{y}_i)} \quad (7)$$
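The two validation statistics can be computed as in the following sketch. The fold assignment, the seed, the function names and the toy values are assumptions made for illustration; how the statistics were averaged over folds is not stated in the text and is not shown here.

```python
import math
import random


def k_fold_indices(n, k=5, seed=2024):
    """Randomly split indices 0..n-1 into k nearly equal validation groups."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def _cov(x, y):
    """Sample covariance of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def accuracy_and_bias(y_true, y_pred, h2):
    """Return (cor(y, y_hat) / sqrt(h2), cov(y, y_hat) / Var(y_hat))."""
    cov_tp = _cov(y_true, y_pred)
    var_t, var_p = _cov(y_true, y_true), _cov(y_pred, y_pred)
    correlation = cov_tp / math.sqrt(var_t * var_p)
    return correlation / math.sqrt(h2), cov_tp / var_p


# Five validation groups for the 2256 animals retained after quality control.
folds = k_fold_indices(2256)

# Toy validation group with hypothetical true and predicted values.
acc, slope = accuracy_and_bias([1, 2, 3, 4], [1, 3, 2, 4], h2=0.64)
```

A regression coefficient close to 1 indicates that the predictions are neither inflated nor deflated relative to the observed values.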

Results
Basic statistical analysis of phenotypic data
The minimum (Min), mean, maximum (Max), standard deviation (SD) and coefficient of variation (CV) of the fleece traits are presented in Table 1, and the distributions of the traits are shown in Additional file 1, Figure S1.

Analysis of genotype data
A total of 43 individuals and 16,360 SNPs were removed from the raw genotype data, leaving 2256 individuals and 50,728 markers for analysis. The numbers of SNPs on each chromosome before and after quality control are shown in Fig. 1. The SNP density after quality control was similar across the 29 autosomes (Fig. 2).
$$b_{y_i,\hat{y}_i} = \frac{\operatorname{cov}(y_i, \hat{y}_i)}{\operatorname{Var}(\hat{y}_i)} \quad (7)$$

Determination of fixed effects
The results demonstrated that year of production, sex, herd and individual age had highly significant effects on the fleece traits (P < 0.01), whereas birth type and dam age had no significant effect on FL, CL and CD (P > 0.05) (Table 2). Therefore, year, sex, herd and individual age should be considered when the genetic parameters of fleece traits in IMCGs are evaluated.

Estimation of genetic parameters
The residual plots of the fleece traits for each method are shown in Figures S2 to S4 [See Additional file 1, Figure S2, Figure S3, Figure S4]. All of them indicate that the models fit well. The variance components and genetic parameters of fleece traits in IMCGs are shown in Table 3. With the ABLUP method, the heritabilities of FL (fiber length), CL (cashmere length), CP (cashmere production) and CD (cashmere diameter) were 0.27, 0.06, 0.15 and 0.24, respectively, and the repeatabilities were 0.51, 0.08, 0.35 and 0.37, respectively. With the GBLUP method, the heritabilities of FL, CL, CP and CD were 0.31, 0.08, 0.20 and 0.28, respectively, and the repeatabilities were 0.48, 0.08, 0.34 and 0.36, respectively. With the ssGBLUP method, the heritabilities of FL, CL, CP and CD were 0.26, 0.05, 0.15 and 0.22, respectively, and the repeatabilities were 0.39, 0.05, 0.26 and 0.24, respectively. Because genomic information is taken into account, the heritability estimated with GBLUP was higher than that estimated with ABLUP, while the estimated repeatability was slightly lower. The standard errors of the genomic-based methods were also lower than those of the pedigree-based method.
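The text does not write out how heritability and repeatability are obtained from the variance components. The expressions below are the usual definitions for a repeatability model, given here as an assumption, using the variance components named in the footnote of Table 3 ($\sigma^2_p$, the phenotypic variance, is a symbol introduced for this illustration).

```latex
\sigma^2_p = \sigma^2_a + \sigma^2_c + \sigma^2_e, \qquad
h^2 = \frac{\sigma^2_a}{\sigma^2_p}, \qquad
rep = \frac{\sigma^2_a + \sigma^2_c}{\sigma^2_p}
```

Under these definitions, the difference between repeatability and heritability is the proportion of phenotypic variance attributed to permanent environmental effects. For FL with ssGBLUP, for example, this would be 0.39 − 0.26 = 0.13.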

Accuracy of GEBV in each method
Akaike's information criterion (AIC) and Schwarz's Bayesian criterion (BIC or SBC) were used to evaluate the goodness of fit of the models. The models fitted with the ssGBLUP and GBLUP methods fitted better than those fitted with the ABLUP method [See Additional file 1, Figure S5]. The accuracy of GEBV obtained with the GBLUP and ssGBLUP methods is shown in Table 4 and Fig. 3.
The results demonstrated that the prediction accuracy for the four fleece traits with ssGBLUP and GBLUP was significantly higher than that with ABLUP. The predictive ability for the fleece traits with ABLUP, GBLUP and ssGBLUP ranged from 36.73% to 41.25%, from 56.18% to 69.06% and from 66.82% to 73.70%, respectively. Except for CL, there was no significant difference in prediction accuracy between the GBLUP and ssGBLUP methods for the other three fleece traits. Numerically, the prediction accuracy for fleece traits with ssGBLUP was slightly higher than that with GBLUP.

Discussion
In this study, sex, year, herd and animal age had highly significant effects on the fleece traits in IMCGs, which is similar to the findings of most studies. Wang (2013) reported that year of production, sex and herd had highly significant influences on all fleece traits [26]. Salehi (2010) evaluated the effect of some environmental factors on the fiber characteristics of Raeini Cashmere goats, and the results indicated that fixed effects including age and sex should be considered in breeding programs [27]. These effects may be explained by differences in rearing conditions, rainfall and grassland quality. In this study, dam age and birth type had no significant effect on the fleece traits of Inner Mongolia Cashmere goats. Newman (1996) found that dam age had no significant effect on cashmere diameter and cashmere length in New Zealand cashmere goats [28]. Snyman reported on the non-genetic factors affecting the growth and fleece traits of Afrino sheep, among which dam age had no significant effect on fiber diameter [29]. Bromley used the REML method to estimate the genetic parameters of prolificacy, weight and wool traits of Columbia, Polypay, Rambouillet and Targhee sheep, and found that birth type had no significant effect on fleece traits [30]. However, Zhou reported that birth type had a significant impact on yearling cashmere length but no significant impact on cashmere diameter, which is inconsistent with our results [31]. This may be due to differences in the data collection period and the size of the phenotypic data set. Therefore, year, sex, herd and individual age should be considered when the genetic parameters of fleece traits in IMCGs are evaluated.

Many methods, including GBLUP, ssGBLUP and Bayesian methods, have been used to perform genomic selection in plants and animals. To some extent, the choice of method affects the prediction accuracy. The results of this study show that the prediction accuracy of ssGBLUP and GBLUP is significantly higher than that of the ABLUP method, which is broadly consistent with other studies [32,33]. Mrode (2021) reported that the estimates of heritability for daily milk yield from GBLUP and ssGBLUP were essentially the same [34], which is similar to this study. Lourenco reported that the prediction accuracy of GEBV for growth traits and calving ease in Angus cattle was higher with single-step genomic BLUP (ssGBLUP) than with BLUP [35]. Teissier (2019) illustrated that the accuracy of GEBV for milk production traits, udder type traits and somatic cell scores in French dairy goats was higher than that obtained with other methods. Similarly, the accuracy of GEBV in ssGBLUP for fiber diameter and live body weight was higher than that with other methods in our study [36]. Wei (2020) compared estimates of genetic parameters and the accuracy of breeding values for wool traits in Merino sheep between pedigree-based best linear unbiased prediction and single-step genomic best linear unbiased prediction; the results showed that the heritabilities of wool traits with ssGBLUP were slightly higher than those obtained with pedigree-based best linear unbiased prediction [37]. The accuracies of estimated breeding values were low to moderate, ranging from 0.362 to 0.573 for the whole population. Compared with ABLUP, GBLUP and ssGBLUP have relatively better prediction ability. Therefore, the ssGBLUP method is suggested for genomic selection of goats. With the continuous progress of breeding work, more efficient and simpler models will be developed and optimized. Applying these methods to genomic selection of important traits in livestock and poultry will accelerate the breeding progress of populations.

Conclusions
In this study, the genetic parameters and genomic breeding values of fleece traits in IMCGs were estimated using the ABLUP, GBLUP and ssGBLUP methods. Regardless of the method used, the heritability of cashmere length is low, while the heritabilities of the other three fleece traits are medium or low to medium. The prediction accuracy of GEBV for fleece traits with GBLUP and ssGBLUP is significantly higher than that with the ABLUP method, and the prediction accuracy with ssGBLUP is slightly higher than that with GBLUP. The accuracy of GEBV for fleece traits with the ssGBLUP method ranged from 66.82% to 73.70%, which is 29.03% to 33.97% higher than that with the ABLUP method. Therefore, ssGBLUP is recommended as the method for genetic evaluation of fleece traits in IMCGs.

$$y = \mu + Xb + Za + Wc + e \quad (2)$$

In GBLUP, the matrix relating to additive genetic effects is the genomic relationship matrix (G) [12,25]. In ssGBLUP, the matrix relating to additive genetic effects is the H matrix:

$$H = \begin{bmatrix} A_{11} + A_{12}A_{22}^{-1}(G - A_{22})A_{22}^{-1}A_{21} & A_{12}A_{22}^{-1}G \\ GA_{22}^{-1}A_{21} & G \end{bmatrix}$$

Here, the individuals are divided into two parts: Part 1 contains the individuals whose genotypes are not available and Part 2 consists of the genotyped individuals. Thus, $A_{11}$ denotes the entries of $A$ that provide the relationships within Part 1, $A_{12}$ and $A_{21}$ the relationships between the individuals of the two parts, and $A_{22}$ the pedigree relationships within Part 2. Moreover, $A_{22}^{-1}$ denotes the inverse of $A_{22}$.

Fig. 2 The distribution of SNP density on each chromosome

Fig. 3 Comparison of the accuracy of GEBV for fleece traits with three methods

As summarized in Table 1, the average values of the four fleece traits, fiber length, cashmere length, cashmere production and cashmere diameter, were 18.89 cm, 6.23 cm, 740.3 g and 15.23 μm, respectively, and the corresponding coefficients of variation were 25.94%, 17.60%, 29.07% and 5.32%. The four fleece traits approximately followed a normal distribution [See Additional file 1, Figure S1].

Table 1 The basic statistics of phenotype values of fleece traits in IMCGs
FL Fiber Length, CL Cashmere Length, CP Cashmere Production, CD Cashmere Diameter

Fig. 1 Comparison of SNP numbers on each chromosome before and after quality control

Table 2 The fixed effects of fleece traits in IMCGs
P < 0.01: the difference is extremely significant; P < 0.05: the difference is significant; P > 0.05: the difference is not significant; DF Degrees of Freedom, SS Sum of Squares, MS Mean Square

Table 3 Estimation of genetic parameters of fleece traits in IMCGs
$\sigma^2_a$ the additive genetic effects variance, $\sigma^2_c$ the permanent environmental effects variance, $\sigma^2_e$ the residual effects variance, SE Standard error, $h^2$ heritability, rep repeatability

Table 4 The accuracy and unbiasedness of GEBV for fleece traits with three methods
Note: Different letters (a, b) indicate significant differences